This document is intended to act as a primer for the use of the new
Neotoma R package, neotoma2. Some users may be working with
this document as part of a workshop for which there is a Binder
instance. The Binder instance will run RStudio in your browser, with all
the required packages installed.
If you are using this workflow on its own, or want to use the package
directly, the
neotoma2 package is available from GitHub and can be
installed in R using the devtools package by running:
devtools::install_github('NeotomaDB/neotoma2')
library(neotoma2)
This workshop will also require other packages. To maintain the flow of this document we’ve placed instructions at the end of the document in the section labelled “Installing packages on your own”.
In this tutorial you will learn how to: (1) search for sites using site names and geographic parameters, (2) filter results using temporal and spatial parameters, (3) obtain sample information for the selected datasets, and (4) perform basic analysis, including the use of climate data from rasters.
If you're planning on working with Neotoma, please join us on Slack, where we manage a channel specifically for questions about the R package. You may also wish to join our Google Groups mailing list; please contact us to be added.
Data in the Neotoma database itself is structured as a set of linked relationships to express the different elements of paleoecological analysis: space and time, raw data, scientific methods and data models. Because paleoecology is such a broad discipline these relationships can be complex, and as such, the database itself is highly structured. If you want to better understand concepts within the database, you can read the Neotoma Database Manual, or take a look at the database schema itself.
In this workshop we want to highlight two key structural concepts: the structure of sites, collection units and datasets within Neotoma itself, and the way those structures are represented within the neotoma2 R package.
Figure. The structure of sites, collection units and datasets within Neotoma. A site contains one or more collection units. Chronologies are associated with collection units. Data of a common type (pollen, diatoms, vertebrate fauna) are assigned to a dataset.
Data in Neotoma is associated with sites – specific locations with lat/long coordinates.
Within a site, there may be one or more collection
units – locations at which samples are physically collected
within the site. For example, an archaeological site
may have one or more collection units, pits within a
broader dig site; a pollen sampling site on a lake may
have multiple collection units – core sites within the
lake basin. Collection units may have higher resolution GPS locations
than the site location, but are considered to be part of the broader
site.
Within a collection unit, data are collected at various analysis units. Any data sampled within an analysis unit are grouped by data type and aggregated into a sample. The set of samples of a particular data type within a collection unit is then assigned to a dataset.
Figure. Diagram of classes in the neotoma2 R package. Each box represents a data class within the package. Individual boxes show the class object, its name, its properties, and functions that can be applied to those objects. For example, a sites object has a property sites that is a list, and the function plotLeaflet() can be used on a sites object.
If we look at the UML
diagram for the objects in the neotoma2 R package we
can see that the data structure generally mimics the structure within
the database itself. As we will see in the Site Searches section, we can search for
these objects, and begin to manipulate them (in the Simple Analysis section).
It is important to note: within the neotoma2 R
package, most objects are sites objects, they just contain
more or less data. There are a set of functions that can operate on
sites. As we add to sites objects, using
get_datasets() or get_downloads(), we are able
to use more of these helper functions.
get_sites()

There are several ways to find sites in neotoma2, but we think of sites primarily as spatial objects. They have names, locations, and are found within the context of geopolitical units, but within the API and the package the site itself does not have associated information about taxa, dataset types or ages. It is simply the container into which we add that information. So, when we search for sites we can search by:
| Parameter | Description |
|---|---|
| sitename | A valid site name (case insensitive) using % as a wildcard. |
| siteid | A unique numeric site id from the Neotoma Database. |
| loc | A bounding box vector, geoJSON or WKT string. |
| altmin | Lower altitude bound for sites. |
| altmax | Upper altitude bound for site locations. |
| database | The constituent database from which the records are pulled. |
| datasettype | The kind of dataset (see get_tables(datasettypes)). |
| datasetid | A unique numeric dataset identifier in Neotoma. |
| doi | A valid dataset DOI in Neotoma. |
| gpid | A unique numeric identifier, or text string, identifying a geopolitical unit in Neotoma. |
| keywords | Unique sample keywords for records in Neotoma. |
| contacts | A name or numeric id for individuals associated with sites. |
| taxa | Unique numeric identifiers or taxon names associated with sites. |
All sites in Neotoma contain one or more datasets. It’s worth noting that the results of these search parameters may be slightly unexpected. For example, searching for sites by sitename, latitude, or altitude will return all of the datasets for the particular site. Searching for terms such as datasettype, datasetid or taxa will return the site, but the only datasets returned will be those matching the dataset-specific search terms. We’ll see this later.
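For example, a sketch combining several of the parameters above (the limit parameter simply caps the number of records returned, and the values here are illustrative):
# Sites with pollen datasets between 100 m and 1000 m elevation.
# Only the pollen datasets at those sites will be returned:
pollen_sites <- neotoma2::get_sites(datasettype = "pollen",
                                    altmin = 100,
                                    altmax = 1000,
                                    limit = 25)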
sitename="%Cheruvu%"We may know exactly what site we’re looking for (“Lac Mouton”), or have an approximate guess for the site name (for example, we know it’s something like “Lait Lake”, or “Lac du Lait”, but we’re not sure how it was entered specifically), or we may want to search all sites that have a specific term, for example, Cheruvu is the Telugu term for a lake or pond. If we are looking for lakes in a particular part of India we may want to search using this term.
We use the general format: get_sites(sitename="XXXXX")
for searching by name.
PostgreSQL (and the API) uses the percent sign as a wildcard. So
"%Cheruvu%" would pick up “Potapuram Cheruvu” for us
(and would pick up “Durgam Cheruvu” and “Pedda Cheruvu” if records for
these lakes existed). Note that the search query is also case
insensitive, so you could simply write "%cheruvu%".
pop_sites <- neotoma2::get_sites(sitename = "%Cheruvu%")
plotLeaflet(pop_sites)
loc = c()

The neotoma package used a bounding box for locations, structured as a vector of latitude and longitude values: c(xmin, ymin, xmax, ymax). The neotoma2 R package supports this simple bounding box, but also more complex spatial objects, using the sf package.
Using the sf package allows us to more easily work with
raster and polygon data in R, and to select sites from more complex
spatial objects. The loc parameter works with the simple
vector, WKT, geoJSON objects and native
sf objects in R.
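For example, a quick sketch using the simple bounding-box form (the coordinates are an illustrative box over peninsular India):
# loc as c(xmin, ymin, xmax, ymax), i.e., west, south, east, north:
india_sites <- neotoma2::get_sites(loc = c(74, 8, 86, 20))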
Looking for sites using a location: here we use a rough representation of a region stretching from East Africa through South Asia. To work with this spatial object in R we transform the geoJSON element to an object for the sf package. There are many other tools for working with spatial objects in R; regardless of how you get the data into R, neotoma2 works with almost all objects in the sf package.
geoJSON <- '{"coordinates":
[[
[20.59, 34.22],
[16.19, 30.57],
[17.75, 19.51],
[43.63, 10.59],
[85.47, 2.15],
[107.32, 8.90],
[107.39, 27.31],
[67.87, 37.84],
[20.59, 34.22]
]],
"type": "Polygon"}'
sa_sf <- geojsonsf::geojson_sf(geoJSON)
sa_sites <- neotoma2::get_sites(loc = sa_sf, all_data = TRUE)
You can always simply plot() the sites
objects, but you will lose some of the geographic context. The
plotLeaflet() function returns a leaflet()
map, and allows you to further customize it, or add additional spatial
data (like our original bounding polygon, sa_sf, which
works directly with the R leaflet package):
neotoma2::plotLeaflet(sa_sites) %>%
leaflet::addPolygons(map = .,
data = sa_sf,
color = "green")
As it is, we can take advantage of functions like
summary() to get a more complete sense of the types of data
we have in sa_sites. The following code gives the summary
table. We do some R magic here to change the way the data is displayed
(turning it into a datatable() object), but the main piece
is the summary() call.
# Give information about the sites themselves: site names, locations, etc.
neotoma2::summary(sa_sites)
# Give the unique identifiers for sites, collection units and datasets found at those sites.
neotoma2::getids(sa_sites)
In this document we list only the first 10 records (there are more; you can use length(sa_sites) to see how many sites you've got). We can see that there are no chronologies associated with
the site objects. This is because, at present, we have not
pulled in the dataset information we need. In Neotoma, a
chronology is associated with a collection unit (and that metadata is
pulled by get_datasets() or get_downloads()).
All we know from get_sites() are the kinds of datasets we
have and the location of the sites that contain the datasets.
We know that within Neotoma, collection units and datasets are
contained within sites. Similarly, a sites object contains
collectionunits which contain datasets. From
the table above we can see that some of the sites we’ve looked at
contain pollen records, some contain geochronologic data and some
contain other dataset types. We could write something like this:
table(summary(sa_sites)$types) to see the different
datasettypes and their counts.
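As a runnable block (types is the dataset-type column in the summary() output):
# Count the number of datasets of each type across our sites:
table(neotoma2::summary(sa_sites)$types)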
With a sites object we can directly call
get_datasets(), to pull in more metadata about the
datasets. The get_datasets() method also supports any of
the search terms listed above in the Site
Search section. At any time we can use datasets() to
get more information about any datasets that a sites object
may contain. Compare the output of datasets(sa_sites) to
the output of a similar call using the following:
sa_datasets <- neotoma2::get_datasets(sa_sites, all_data = TRUE)
datasets(sa_datasets)
You can see that this provides information only about the specific dataset, not the site! For a more complete record we can join the site information from summary() to the dataset information from datasets(), using the getids() function, which links sites to all the collection units and datasets they contain.
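A minimal sketch of that join, assuming the siteid and datasetid columns in the getids() output match those in the summary() and datasets() tables (check the column names, and align the id column types, in your own session before joining):
# Link dataset-level metadata back to site-level metadata through the
# site/collectionunit/dataset id table returned by getids():
dsids <- neotoma2::getids(sa_datasets)
sitetbl <- neotoma2::summary(sa_datasets)
dstbl <- as.data.frame(neotoma2::datasets(sa_datasets))

fullrecord <- dsids %>%
  dplyr::left_join(dstbl, by = "datasetid") %>%
  dplyr::left_join(sitetbl, by = "siteid")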
If we choose to pull in information about only a single dataset type,
or if there is additional filtering we want to do before we download the
data, we can use the filter() function. For example, if we
only want sedimentary pollen records (as opposed to pollen surface
samples), and want records with known chronologies, we can filter by
datasettype and by the presence of an
age_range_young, which would indicate that there is a
chronology that defines bounds for ages within the record.
sa_records <- sa_datasets %>%
neotoma2::filter(datasettype == "pollen" & !is.na(age_range_young))
neotoma2::summary(sa_records)
# We've removed records, so the new object should be shorter than the original.
length(sa_records) < length(sa_datasets)
We can see now that the data table looks different (compare it to the table above), and there are fewer total sites. Again, there is no explicit chronology for these records; we need to pull down the complete download to get that, but we begin to get a sense of what kind of data we have.
sample() data

Because sample data adds a lot of overhead (for this pollen data, the object that includes the dataset with samples is 20 times larger than the dataset alone), we try to call get_downloads() after we've done our preliminary filtering. After get_datasets() you have enough information to filter based on location, time bounds and dataset type. When we move to get_downloads() we can do more fine-tuned filtering at the analysis unit or taxon level.
The following call can take some time, but we’ve frozen the object as an RDS data file. You can run this command on your own, and let it run for a bit, or you can just load the object in.
## This line is commented out because we've already run it for you.
## sa_dl <- sa_records %>% get_downloads(all_data = TRUE)
## saveRDS(sa_dl, "data/saDownload.RDS")
sa_dl <- readRDS("data/saDownload.RDS")
Once we've downloaded the records, we have information for each site about all the associated collection units and datasets, and, for each dataset, all the associated samples. To extract all the samples we can call:
allSamp <- samples(sa_dl)
When we've done this, we get a data.frame that is 68823 rows long and 37 columns wide. The reason the table is so wide is that we are returning data in a long format, where each row carries all the information you should need to properly interpret it:
## [1] "age" "agetype" "ageolder" "ageyounger"
## [5] "chronologyid" "chronologyname" "units" "value"
## [9] "context" "element" "taxonid" "symmetry"
## [13] "taxongroup" "elementtype" "variablename" "ecologicalgroup"
## [17] "analysisunitid" "sampleanalyst" "sampleid" "depth"
## [21] "thickness" "samplename" "datasetid" "database"
## [25] "datasettype" "age_range_old" "age_range_young" "datasetnotes"
## [29] "siteid" "sitename" "lat" "long"
## [33] "area" "sitenotes" "description" "elev"
## [37] "collunitid"
For some dataset types or analyses, some of these columns may not be needed; for other dataset types they may be critically important. To allow the neotoma2 package to be as useful as possible for the community we've included as many as we can.
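For example, for a pollen-focused analysis we might keep only a lean subset of those columns (a sketch; choose whichever columns matter for your own analysis):
leanSamp <- allSamp %>%
  dplyr::select(siteid, sitename, lat, long,
                datasetid, age, agetype, depth,
                variablename, ecologicalgroup, units, value)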
If you want to know what taxa we have in the record you can use the helper function taxa() on the sites object. The taxa() function gives us not only the unique taxa, but also two additional columns, sites and samples, which tell us how many sites and how many samples each taxon appears in, to help us better understand how common individual taxa are.
neotomatx <- neotoma2::taxa(sa_dl)
Taxonomies in Neotoma are not as straightforward as we might expect. Taxonomic identification in paleoecology can be complex, impacted by the morphology of the object we are trying to identify, the condition of the palynomorph, the expertise of the analyst, and many other conditions. You can read more about concepts of taxonomy within Neotoma in the Neotoma Manual's section on Taxonomic concepts.
We use the unique identifiers (e.g., taxonid,
siteid, analysisunitid) throughout the
package, since they help us to link between records. The
taxonid values returned by the taxa() call can
be linked to the taxonid column in the
samples() table. This allows us to build taxon
harmonization tables if we choose to. You may also note that the
taxonname is in the field variablename.
Individual sample counts are reported in Neotoma as variables. A "variable" may be a taxon, a laboratory measurement, or a non-organic proxy such as charcoal or XRF measurements, and includes both the units of measurement and the value.
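Because each variable carries its own units, it is worth checking which units appear before summing or comparing values. A quick sketch using the allSamp table from above:
# Tabulate the units of measurement by dataset type:
allSamp %>%
  dplyr::count(datasettype, units)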
Let's say we want all samples in which Quercus taxa have been reported to be grouped together into one pseudo-taxon called Quercus-undiff. Note that this is not an ecologically useful grouping; we use it here purely for illustration.
There are several ways of grouping taxa: either directly, by exporting the file and editing each individual cell, or by creating an external "harmonization" table (as we did in the prior neotoma package). First, let's look at how many different ways Quercus appears in these records:
## # A tibble: 15 × 11
## # Groups: units, context, element, taxonid, symmetry, taxongroup,
## # elementtype, variablename, ecologicalgroup [15]
## units context element taxonid symme…¹ taxon…² eleme…³ varia…⁴ ecolo…⁵ samples
## <chr> <chr> <chr> <int> <lgl> <chr> <chr> <chr> <chr> <int>
## 1 NISP <NA> pollen 251 NA Vascul… pollen Quercus TRSH 173
## 2 NISP <NA> pollen 2342 NA Vascul… pollen Quercu… TRSH 245
## 3 NISP <NA> pollen 2532 NA Vascul… pollen Quercu… TRSH 252
## 4 NISP <NA> pollen 2929 NA Vascul… pollen Quercu… TRSH 15
## 5 NISP <NA> pollen 3592 NA Vascul… pollen Quercu… TRSH 206
## 6 NISP <NA> pollen 3593 NA Vascul… pollen Quercu… TRSH 8
## 7 NISP <NA> pollen 4713 NA Vascul… pollen Quercu… TRSH 196
## 8 NISP <NA> pollen 4716 NA Vascul… pollen Quercu… TRSH 606
## 9 NISP <NA> pollen 4717 NA Vascul… pollen Quercu… TRSH 158
## 10 NISP <NA> pollen 4721 NA Vascul… pollen Quercu… TRSH 2
## 11 NISP <NA> pollen 33668 NA Vascul… pollen Quercu… TRSH 47
## 12 NISP <NA> pollen 33676 NA Vascul… pollen Quercu… TRSH 52
## 13 NISP <NA> pollen 33691 NA Vascul… pollen Quercu… TRSH 392
## 14 NISP <NA> pollen 33692 NA Vascul… pollen Quercu… TRSH 158
## 15 NISP <NA> pollen 46157 NA Vascul… pollen Quercu… TRSH 170
## # … with 1 more variable: sites <int>, and abbreviated variable names
## # ¹symmetry, ²taxongroup, ³elementtype, ⁴variablename, ⁵ecologicalgroup
Programmatically, we can harmonize taxon by taxon using matching and
transformation. We’re using dplyr type coding here to
mutate() the column variablename so that any
time we detect (str_detect()) a variablename
that starts with Quercus (the .* represents a
wildcard for any character [.], zero or more times
[*]) we replace() it with the character string
"Quercus-undiff". Note that this changes Quercus
only in the allSamp object, not in any of the
downloaded objects. If we were to call samples() again, the
taxonomy would return to its original form.
We’re going to filter the ecological groups to include only TRSH (Trees & Shrubs). More information about ecological groups is available from the Neotoma Online Manual.
# Change all instances of a Quercus-type taxon in the samples table
# to a fixed name "Quercus-undiff"
allSamp <- allSamp %>%
dplyr::filter(ecologicalgroup == "TRSH") %>%
mutate(variablename = replace(variablename,
stringr::str_detect(variablename, "Quercus.*"),
"Quercus-undiff"))
There were originally 15 different taxa identified as being within the genus Quercus (including Quercus, Quercus robur-type, and Quercus coccifera). The above code reduces them all to a single taxonomic group, Quercus-undiff.
If we want to have an artifact of our choices, we can use an external table. For example, a table of pairs (what we want changed, and the name we want it replaced with) can be generated, and it can include regular expressions (if we choose):
| original | replacement |
|---|---|
| Pinus.* | Pinus-undiff |
| Picea.* | Picea-undiff |
| Tamarindus.* | Tamarindus-undiff |
| Quercus.* | Quercus-undiff |
| … | … |
We can get the list of original names directly from the
taxa() call, applied to a sites object that
contains samples, and then export it using write.csv().
taxaplots <- taxa(sa_dl)
# Save the taxon list to file so we can edit it subsequently.
readr::write_csv(taxaplots, "data/mytaxontable.csv")
Figure. A plot of the number of sites a taxon appears in, against the number of samples a taxon appears in.
The plot is mostly for illustration, but we can see, as a sanity check, that the relationship is as we’d expect. There are a large number of taxa that are rarely present, and then several that are quite common.
You can then export either one of these tables and add a column with the counts; you could also add extra contextual information, such as the ecologicalgroup or taxongroup, to help you out.
Once you’ve cleaned up the translation table you can load it in, and
then apply the transformation:
translation <- readr::read_csv("data/taxontable.csv")
You can see we’ve changed some of the taxon names in the taxon table
(don’t look too far, I just did this as an example). To replace the
names in the samples() output, we’ll join the two tables
using an inner_join() (meaning the
variablename must appear in both tables for the result to
be included), and then we’re going to select only those elements of the
sample tables that are relevant to our later analysis:
allSamp <- samples(sa_dl)
allSamp <- allSamp %>%
inner_join(translation, by = c("variablename" = "variablename")) %>%
dplyr::select(!c("variablename")) %>%
group_by(siteid, sitename, harmonizedname,
sampleid, units, age,
agetype, depth, datasetid,
long, lat) %>%
summarise(value = sum(value), .groups='keep')
We now have a cleaner set of taxon names compared to the original table, both because of harmonization and because we cleared out many of the non-TRSH taxa from the harmonization table (the original samples() table is below).
We can use packages like rioja to do stratigraphic
plotting for a single record, but first we need to do some different
data management. Although we could do harmonization again we’re going to
simply take the top ten most common taxa at a single site and plot them
in a stratigraphic diagram.
We're using the arrange() call to sort by the number of times each taxon appears within the core, so that we can select the taxa that appear in the first ten rows of the plottingTaxa data.frame (a sketch of how to build plottingTaxa follows the toWide() discussion below).
# Get a particular site, in this case we are simply subsetting the
# `sa_dl` object:
plottingSite <- sa_dl[[1]]
# Select only pollen measured using NISP and convert to a "wide"
# table, using proportions. The first column will be "age".
# This turns our "long" table into a "wide" table:
counts <- plottingSite %>%
samples() %>%
toWide(ecologicalgroup = c("TRSH"),
unit = c("NISP"),
elementtypes = c("pollen"),
groupby = "age",
operation = "prop")
Hopefully the code is pretty straightforward. The
toWide() function provides you with significant control
over the taxa, units and other elements of your data before you get them
into the wide matrix (depth by taxon) that
most statistical tools such as the vegan package or
rioja use.
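The plottingTaxa table mentioned above can be built with the taxa() helper. Here is a sketch, assuming the samples and variablename columns described earlier, that ranks the taxa and trims the wide counts table to the ten most common:
# Rank TRSH taxa by the number of samples they appear in; keep the top ten:
plottingTaxa <- neotoma2::taxa(plottingSite) %>%
  dplyr::filter(ecologicalgroup == "TRSH") %>%
  dplyr::arrange(dplyr::desc(samples)) %>%
  head(n = 10)

# Keep the age column plus only those ten taxa in the wide table:
counts <- counts %>%
  dplyr::select(age, dplyr::any_of(plottingTaxa$variablename))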
To plot the data we can use rioja's strat.plot(), sorting the taxa using weighted averaging scores (wa.order). I've also added a CONISS plot to the edge of the plot, to show how the new wide data frame works with distance metric functions.
# Perform constrained clustering:
clust <- rioja::chclust(dist(sqrt(counts)),
method = "coniss")
# Plot the stratigraphic plot, converting proportions to percentages:
plot <- rioja::strat.plot(counts[,-1] * 100, yvar = counts$age,
title = sa_dl[[1]]$sitename,
ylabel = "Calibrated Years BP",
xlabel = "Pollen (% of Trees and Shrubs)",
y.rev = TRUE,
clust = clust,
wa.order = "topleft",
scale.percent = TRUE)
rioja::addClustZone(plot, clust, 4, col = "red")
We now have site information across our region, from East Africa to Southeast Asia, with samples and taxon names. I'm interested in looking at the distributions of taxa across time, and the proportion of sites at which each is present. I'm going to pick the most common taxa (those whose counts fall within the top 5% across these records) and look at their distributions in time. We're going to use our harmonization table again, to clean up the taxon names.
# Harmonize the sample names as above.
# Note that when we group we are treating `age` in a special way.
# If we multiply age by 2, round to the nearest thousand (round(x, -3)), and
# then divide by two, we effectively put the data into 500 year bins:
# First, get the number of sites at which each taxon appears, within each
# 500 year bin:
taxaSitesByAge <- samples(sa_dl) %>%
inner_join(translation, by = c("variablename" = "variablename")) %>%
dplyr::select(!c("variablename")) %>%
group_by(harmonizedname,
"age" = round(age * 2, -3) / 2) %>%
summarise(n = length(unique(siteid)), .groups = 'keep')
# Then get the total number of sites sampled within each 500 year bin:
samplesByAge <- samples(sa_dl) %>%
inner_join(translation, by = c("variablename" = "variablename")) %>%
dplyr::select(!c("variablename")) %>%
group_by("age" = round(age * 2, -3) / 2) %>%
summarise(samples = length(unique(siteid)), .groups = 'keep')
# Now get the proportion of sites at which each taxon appears:
groupbyage <- taxaSitesByAge %>%
inner_join(samplesByAge, by = "age") %>%
mutate(proportion = n / samples)
# These lines of code give us the most common taxa
# in this dataset (their count of samples is in the top 5%)
mostCommon <- taxaSitesByAge %>%
group_by(harmonizedname) %>%
summarise(count = sum(n)) %>%
filter(count > quantile(count, 0.95))
# The last thing to do is to select only some of the taxa:
subsetTaxa <- groupbyage %>%
filter(harmonizedname %in% mostCommon$harmonizedname)
# And then plot!
ggplot(subsetTaxa,
aes(x = age, y = proportion)) +
geom_point() +
geom_smooth(method = 'gam') +
facet_wrap(~harmonizedname) +
coord_cartesian(xlim = c(20000, 0), ylim = c(0, 1)) +
scale_x_reverse(breaks = c(10000, 20000)) +
xlab("Proportion of Sites with Taxon") +
theme_bw()
We can see clear patterns of change, and the smooths are modeled
using Generalized Additive Models (GAMs) in R, so we can have more or
less control over the actual modeling using the gam or
mgcv packages. Depending on how we divide the data we can
also look at shifts in altitude, latitude or longitude to better
understand how species distributions and abundances changed over time in
this region.
We are often interested in the interaction between taxa and climate, assuming that time is a proxy for changing environments. The development of large-scale global datasets for climate has made it relatively straightforward to access such data from the cloud in raster format. R provides a number of tools (in the sf and raster packages) for managing and analyzing spatial data.
The first step is taking our sample data and turning it into a
spatial object using the sf package in R:
modern <- samples(sa_dl) %>%
filter(age < 1000) %>%
filter(ecologicalgroup == "TRSH" & elementtype == "pollen" & units == "NISP")
spatial <- sf::st_as_sf(modern,
coords = c("long", "lat"),
crs = "+proj=longlat +datum=WGS84")
The data are effectively the same: sf makes an object called spatial that is a data.frame with all the information from samples(), plus a column (geometry) that contains the spatial data.
We can use the getData()
function in the raster package to get climate data from
WorldClim. The operations that follow here can be applied to any sort of
raster data, provided it is loaded into R as a raster
object.
Here we pull in the raster data, at a 10 minute resolution for the
\(T_{max}\) variable, maximum monthly
temperature. The raster itself has 12 layers, one for each month. With
the extract() function we just get information for the
seventh month, July.
worldTmax <- raster::getData('worldclim', var = 'tmax', res = 10)
spatial$tmax7 <- raster::extract(worldTmax, spatial)[,7]
This adds a column to the data.frame spatial that contains the maximum July temperature for each taxon at each site (all taxa at a site will share the same value). We've already filtered to TRSH taxa, but that still leaves us with a large number of distinct taxon names. We're going to use dplyr's mutate() function to extract just the genus:
spatial <- spatial %>%
mutate(variablename = stringr::str_replace(variablename, "[[:punct:]]", " ")) %>%
mutate(variablename = stringr::word(variablename, 1)) %>%
group_by(variablename, siteid) %>%
summarise(tmax7 = max(tmax7), .groups = "keep") %>%
group_by(variablename) %>%
filter(n() > 3)
We want to get the background distribution of July temperatures across the region, to plot our taxon distributions against. We do this by taking the maximum temperature value at each site; since all values at a site are the same (because we used a spatial overlay), the maximum is simply the July temperature at the site.
maxsamp <- spatial %>%
dplyr::group_by(siteid) %>%
dplyr::summarise(tmax7 = max(tmax7), .groups = 'keep')
Now we’re going to plot it out, using facet_wrap() to
plot each taxon in its own panel:
ggplot() +
geom_density(data = spatial,
aes(x = round(tmax7 / 10, 0)), col = 2) +
facet_wrap(~variablename) +
geom_density(data = maxsamp, aes(x = tmax7 / 10)) +
xlab("Maximum July Temperature") +
ylab("Kernel Density")
So, we’ve done a lot in this example. We’ve (1) searched for sites using site names and geographic parameters, (2) filtered results using temporal and spatial parameters, (3) obtained sample information for the selected datasets and (4) performed basic analysis including the use of climate data from rasters. Hopefully you can use these examples as templates for your own future work, or as a building block for something new and cool!
We use several packages in this document, including
leaflet, sf and others. We load the packages
using the pacman package, which will automatically install
the packages if they do not currently exist in your set of packages.
options(warn = -1)
pacman::p_load(neotoma2, dplyr, ggplot2, sf, geojsonsf, leaflet, terra, DT)
Note that R is sensitive to the order in which packages are loaded.
Using neotoma2:: tells R explicitly that you want to use
the neotoma2 package to run a particular function. So, for
a function like filter(), which exists in other packages
such as dplyr, you may see an error that looks like:
Error in UseMethod("filter") :
no applicable method for 'filter' applied to an object of class "sites"
In that case it's likely that the wrong package is trying to run filter(), so explicitly adding dplyr:: or neotoma2:: in front of the function name (i.e., neotoma2::filter()) is good practice.